English Emotional Voice Conversion Using StarGAN Model

Abstract

The StarGANv2-VC model is a many-to-many non-parallel generative adversarial network (GAN) voice conversion (VC) model that has proven effective in style conversion tasks. This study aimed to investigate the scalability and diversity of StarGANv2-VC for English emotional voice conversion (EVC) across different speakers and emotions. We carried out five experiments using an Emotional Speech Database (ESD), comprising a single speaker-multi-emotion experiment, a multi-speakers-multi-emotions experiment (gender-dependent), and a multi-speakers-multi-emotions experiment (gender-independent). We also assessed the effect of training set size and compared the performance of StarGANv2-VC with the CycleGAN model. Our study found that StarGANv2-VC accurately converted pitch for all four emotions (neutral, happy, sad, and angry). However, the model's efficiency in converting multi-emotions for multi-speakers was not as high as its efficiency in multi-speaker conversion. Further research is needed in this area. We objectively assessed the quality of the converted speech using Mel-frequency cepstral distortion (MCD) and root-mean-square error (RMSE) for spectrum and prosody, respectively. Additionally, we conducted cross-emotion recognition using a convolutional recurrent neural network (CRNN).
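The two objective metrics named in the abstract can be illustrated with a minimal sketch. It uses one common definition of MCD (in dB, with the 0th cepstral coefficient excluded) and a plain RMSE over F0 values; the abstract does not state the exact variants used in the paper, so the formulas, the function names `mel_cepstral_distortion` and `f0_rmse`, and the assumption that frames are already time-aligned (for example by dynamic time warping) are all illustrative choices rather than the authors' implementation.

```python
import math


def mel_cepstral_distortion(converted, target):
    """Mean MCD in dB over time-aligned frames.

    Each argument is a list of frames; each frame is a list of
    mel-cepstral coefficients with the 0th (energy) term already dropped.
    Assumption: both sequences are already aligned frame by frame.
    """
    if len(converted) != len(target) or not converted:
        raise ValueError("inputs must be non-empty and frame-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for conv_frame, tgt_frame in zip(converted, target):
        # Squared Euclidean distance between the two coefficient vectors.
        sq_dist = sum((c - t) ** 2 for c, t in zip(conv_frame, tgt_frame))
        total += scale * math.sqrt(2.0 * sq_dist)
    return total / len(converted)


def f0_rmse(converted_f0, target_f0):
    """RMSE in Hz over frames that are voiced (F0 > 0) in both contours."""
    pairs = [(c, t) for c, t in zip(converted_f0, target_f0) if c > 0 and t > 0]
    if not pairs:
        raise ValueError("no frames voiced in both contours")
    return math.sqrt(sum((c - t) ** 2 for c, t in pairs) / len(pairs))
```

Under these definitions, lower values of both metrics mean the converted speech is closer to the target: MCD reflects spectral distance and F0 RMSE reflects pitch (prosody) distance.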

Similar resources

Emotional Voice Conversion for Mandarin using Tone Nucleus Model – Small Corpus and High Efficiency

GMM-based spectral conversion techniques have been applied to emotion conversion, but spectral transformation alone was found to be insufficient for conveying the required target emotion. In this paper, we adopt the tone nucleus model to carry the most important information of tones and to represent the F0 contour for Mandarin speech. The tone nucleus part is then converted to emotional speech fr...

Voice Conversion Using Articulatory Features

The aim of voice conversion is to transform an utterance spoken by an arbitrary (source) speaker to that of a specific (target) speaker. Text-to-speech (TTS), speech-to-speech translation, mimicry generation and human-machine interaction systems are among the numerous applications which can be greatly benefited by having a voice conversion module. Generally voice conversion systems require para...

Voice conversion between UK and US accented English

This paper presents an HMM-based method and experimental results for voice conversion between UK and US accented English. Phonetic-tree based tied-state triphone HMMs are used to map equivalent states of the source and target spectra. Then a linear transformation method is incorporated to estimate the most likely target spectra for a given input. The mapping is between two different sets of phon...

GMM-based voice conversion applied to emotional speech synthesis

A voice conversion method is applied to synthesizing emotional speech from standard reading (neutral) speech. Pairs of neutral speech and emotional speech are used for conversion rule training. The conversion adopts a GMM (Gaussian Mixture Model) with DFW (Dynamic Frequency Warping). We also adopt STRAIGHT, the high-quality speech analysis-synthesis algorithm. As conversion target emotions, (Hot) a...

Syllabic Pitch Tuning for Neutral-to-emotional Voice Conversion

Prosody plays an important role in neutral-to-emotional voice conversion. Prosodic features like pitch are usually estimated and altered at a segmental level based on short windowing of speech signal (where the signal is expected to be quasi-stationary). This results in a frame-wise change of acoustical parameters for synthesizing emotionalized speech. In order to convert a neutral speech to an...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3292003